VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Authors

Abstract

The world is constantly changing, and so is the massive amount of data it produces. However, only a few studies deal with online class imbalance learning, which combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy, to be prepended to any streaming machine learning classification algorithm, that uses a new version of SMOTE and Borderline-SMOTE inspired by data sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose performance is better than the state of the art. Moreover, we analyze the time and memory consumption and the concept-drift recovery speed.
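The abstract describes VFC-SMOTE only at a high level: an oversampling step placed in front of any incremental classifier. The Python sketch below illustrates that general "prepended oversampler" idea under stated assumptions; it is not the authors' algorithm. It replaces the paper's data sketches with a bounded FIFO buffer of recent minority examples, uses plain SMOTE interpolation instead of the Borderline variant, and assumes a single, known minority label. `StreamingSmoteWrapper`, `learn_one`, `buffer_size`, `k`, and `ratio` are hypothetical names chosen for illustration.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

Vector = List[float]


class StreamingSmoteWrapper:
    """Hypothetical SMOTE-style oversampler prepended to an incremental learner.

    Simplification (assumption): a bounded FIFO buffer of recent minority
    examples stands in for VFC-SMOTE's data sketches, and plain SMOTE
    interpolation stands in for its SMOTE/Borderline-SMOTE variant.
    """

    def __init__(self, learn_one: Callable[[Vector, int], None], minority_label: int,
                 buffer_size: int = 100, k: int = 5, ratio: int = 1, seed: int = 0):
        self.learn_one = learn_one            # downstream learner's update function
        self.minority_label = minority_label  # assumed known in advance
        self.buffer: Deque[Vector] = deque(maxlen=buffer_size)  # oldest examples drop out
        self.k = k                            # neighbours considered for interpolation
        self.ratio = ratio                    # synthetic examples per minority arrival
        self.rng = random.Random(seed)

    def _neighbours(self, x: Vector) -> List[Vector]:
        # Rank buffered minority examples by squared Euclidean distance to x.
        ranked = sorted(self.buffer, key=lambda b: sum((a - c) ** 2 for a, c in zip(x, b)))
        return ranked[: self.k]

    def learn(self, x: Vector, y: int) -> int:
        """Forward (x, y) to the learner; return how many synthetic examples were emitted."""
        self.learn_one(x, y)
        if y != self.minority_label:
            return 0
        emitted = 0
        neighbours = self._neighbours(x)
        if neighbours:  # no interpolation until at least one minority example is buffered
            for _ in range(self.ratio):
                nb = self.rng.choice(neighbours)
                gap = self.rng.random()  # uniform in [0, 1)
                # SMOTE: a random point on the segment between x and its neighbour.
                synthetic = [a + gap * (b - a) for a, b in zip(x, nb)]
                self.learn_one(synthetic, y)
                emitted += 1
        self.buffer.append(list(x))
        return emitted


# Usage: a stand-in "learner" that simply records every example it receives.
seen: List[Tuple[Vector, int]] = []
wrapper = StreamingSmoteWrapper(lambda x, y: seen.append((x, y)),
                                minority_label=1, k=3, ratio=2)
stream = [([0.0, 0.0], 0), ([1.0, 1.0], 1), ([0.2, 0.1], 0), ([1.2, 0.8], 1)]
for x, y in stream:
    wrapper.learn(x, y)
```

In this toy stream, the first minority example finds an empty buffer and produces nothing; the second one yields two synthetic points on the segment between `[1.2, 0.8]` and `[1.0, 1.0]`. A FIFO buffer was chosen only because it is the simplest way to forget old minority examples under drift; the paper's sketch-based summaries are meant to serve that role more efficiently.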


Similar resources

RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique

The problem of imbalanced data, i.e., when the class labels are unequally distributed, is encountered in many real-life applications, e.g., credit scoring, medical diagnostics. Various approaches aimed at dealing with the imbalanced data have been proposed. One of the most well-known data pre-processing methods is the Synthetic Minority Oversampling Technique (SMOTE). However, SMOTE may generate ...


SMOTE: Synthetic Minority Over-sampling Technique

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of “normal” examples with only a small percentage of “abnormal” or “interesting” examples. It is also the case that the cost of misclassifying an abnormal (i...


Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification a...


Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling

In the classification framework there are problems in which the number of examples per class is not equitably distributed, formerly known as imbalanced data sets. This situation is a handicap when trying to identify the minority classes, as the learning algorithms are not usually adapted to such characteristics. A usual approach to deal with the problem of imbalanced data sets is the use of a ...


Fast Perceptron Decision Tree Learning from Evolving Data Streams

Mining of data streams must balance three evaluation dimensions: accuracy, time and memory. Excellent accuracy on data streams has been obtained with Naive Bayes Hoeffding Trees—Hoeffding Trees with naive Bayes models at the leaf nodes—albeit with increased runtime compared to standard Hoeffding Trees. In this paper, we show that runtime can be reduced by replacing naive Bayes with perceptron c...



Journal

Journal title: Data Mining and Knowledge Discovery

Year: 2021

ISSN: 1573-756X, 1384-5810

DOI: https://doi.org/10.1007/s10618-021-00786-0